18 research outputs found

    Odyssey: a semi-automated pipeline for phasing, imputation, and analysis of genome-wide genetic data

    BACKGROUND: Genome imputation, admixture resolution and genome-wide association analyses are time-consuming and computationally intensive processes with many composite and requisite steps. Analysis time increases further when building and installing the programs required for these analyses. For scientists who may not be as versed in programming languages, but want to perform these operations hands-on, there is a lengthy learning curve to utilize the vast number of programs available for these analyses. RESULTS: In an effort to streamline the entire process with easy-to-use steps for scientists working with big data, the Odyssey pipeline was developed. Odyssey is a simplified, efficient, semi-automated genome-wide imputation and analysis pipeline, which prepares raw genetic data, performs pre-imputation quality control, phasing, imputation, post-imputation quality control, population stratification analysis, and genome-wide association with statistical data analysis, including result visualization. Odyssey is a pipeline that integrates programs such as PLINK, SHAPEIT, Eagle, IMPUTE, Minimac, and several R packages, to create a seamless, easy-to-use, and modular workflow controlled via a single user-friendly configuration file. Odyssey was built with compatibility in mind, and thus utilizes the Singularity container solution, which can be run on Linux, MacOS, and Windows platforms. It is also easily scalable from a simple desktop to a High-Performance System (HPS). CONCLUSION: Odyssey facilitates efficient and fast genome-wide association analysis automation and can go from raw genetic data to genome:phenome association visualization and analysis results in 3-8 h on average, depending on the input data, choice of programs within the pipeline and available computer resources. Odyssey was built to be flexible, portable, compatible, scalable, and easy to set up. Biologists less familiar with programming can now work hands-on with their own big data using this easy-to-use pipeline.

    The RavA-ViaA Chaperone-Like System Interacts with and Modulates the Activity of the Fumarate Reductase Respiratory Complex

    Regulatory ATPase variant A (RavA) is a MoxR AAA+ protein that functions together with a partner protein that we termed VWA interacting with AAA+ ATPase (ViaA) containing a von Willebrand Factor A domain. However, the functional role of RavA-ViaA in the cell is not yet well established. Here, we show that RavA-ViaA are functionally associated with anaerobic respiration in Escherichia coli through interactions with the fumarate reductase (Frd) electron transport complex. Expression analysis of ravA and viaA genes showed that both proteins are co-expressed with multiple anaerobic respiratory genes, many of which are regulated by the anaerobic transcriptional regulator Fnr. Consistently, the expression of both ravA and viaA was found to be dependent on Fnr in cells grown under oxygen-limiting conditions. ViaA was found to physically interact with FrdA, the flavin-containing subunit of the Frd complex. Both RavA and the Fe–S-containing subunit of the Frd complex, FrdB, regulate this interaction. Importantly, Frd activity was observed to increase in the absence of RavA and ViaA. This indicates that RavA and ViaA modulate the activity of the Frd complex, signifying a potential regulatory chaperone-like function for RavA-ViaA during bacterial anaerobic respiration with fumarate as the terminal electron acceptor.

    Prediction and validation of the unexplored RNA-binding protein atlas of the human proteome

    Detecting protein-RNA interactions is challenging both experimentally and computationally because RNAs are large in number, diverse in cellular location and function, and flexible in structure. As a result, many RNA-binding proteins (RBPs) remain to be identified. Here, a template-based, function-prediction technique for RBPs, SPOT-Seq, is applied to the human proteome and its result is validated by a recent proteomic experimental discovery of 860 mRNA-binding proteins (mRBPs). The coverage (or sensitivity) is 42.6% for 1217 known RBPs annotated in the Gene Ontology and 43.6% for 860 newly discovered human mRBPs. Consistent sensitivity indicates the robust performance of SPOT-Seq for predicting RBPs. More importantly, SPOT-Seq detects 2418 novel RBPs in the human proteome, 291 of which were validated by the newly discovered mRBP set. Among the 291 validated novel RBPs, 61 are not homologous to any known RBPs. Successful validation of predicted novel RBPs permits further analysis of their phenotypic roles in disease pathways. The dataset of 2418 predicted novel RBPs along with confidence levels and complex structures is available at http://sparks-lab.org (in publications) for experimental confirmations and hypothesis generation.

    Embryonic ethanol exposure alters expression of sox2 and other early transcripts in zebrafish, producing gastrulation defects

    Ethanol exposure during prenatal development causes fetal alcohol spectrum disorder (FASD), the most frequent preventable birth defect and neurodevelopmental disability syndrome. The molecular targets of ethanol toxicity during development are poorly understood. Developmental stages surrounding gastrulation are very sensitive to ethanol exposure. To understand the effects of ethanol on early transcripts during embryogenesis, we treated zebrafish embryos with ethanol during the pre-gastrulation period and examined the transcripts by Affymetrix GeneChip microarray before gastrulation. We identified 521 significantly dysregulated genes, including 61 transcription factors, in ethanol-exposed embryos. Sox2, the key regulator of pluripotency and early development, was significantly reduced. Functional annotation analysis showed enrichment in transcription regulation, embryonic axes patterning, and signaling pathways, including Wnt, Notch and retinoic acid. We identified all potential genomic targets of 25 dysregulated transcription factors and compared their interactions with the ethanol-dysregulated genes. This analysis predicted that Sox2 targeted a large number of ethanol-dysregulated genes. A gene regulatory network analysis showed that many of the dysregulated genes are targeted by multiple transcription factors. Injection of sox2 mRNA partially rescued ethanol-induced gene expression, epiboly and gastrulation defects. Additional studies of this ethanol-dysregulated network may identify therapeutic targets that coordinately regulate early development.

    Splicing factor ESRP1 controls ER-positive breast cancer by altering metabolic pathways

    The epithelial splicing regulatory proteins 1 and 2 (ESRP1 and ESRP2) control the epithelial-to-mesenchymal transition (EMT) splicing program in cancer. However, their role in breast cancer recurrence is unclear. In this study, we report that high levels of ESRP1, but not ESRP2, are associated with poor prognosis in estrogen receptor-positive (ER+) breast tumors. Knockdown of ESRP1 in endocrine-resistant breast cancer models decreases growth significantly and alters the EMT splicing signature, which we confirm using TCGA SpliceSeq data of ER+ BRCA tumors. However, these changes are not accompanied by the development of a mesenchymal phenotype or a change in key EMT-transcription factors. In tamoxifen-resistant cells, knockdown of ESRP1 affects lipid metabolism and oxidoreductase processes, resulting in the decreased expression of fatty acid synthase (FASN), stearoyl-CoA desaturase 1 (SCD1), and phosphoglycerate dehydrogenase (PHGDH) at both the mRNA and protein levels. Furthermore, ESRP1 knockdown increases the basal respiration and spare respiration capacity. This study reports a novel role for ESRP1 that could form the basis for the prevention of tamoxifen resistance in ER+ breast cancer.

    Whole-Genome Sequence of Sungri/96 Vaccine Strain of Peste des Petits Ruminants Virus

    We report the complete genome sequence of the Sungri/96 vaccine strain of peste des petits ruminants virus (PPRV). The whole-genome nucleotide sequence has 89 to 99% identity with the available PPRV genome sequences in the NCBI database. This study helps to understand the epidemiological and molecular characteristics of the Sungri/96 strain.

    Integrating Data Science into T32 Training Programs at IUPUI

    Data science is critically important to the biomedical research enterprise. Many research efforts, now and in the future, will employ advanced computational techniques to analyze extremely large datasets in order to discover insights relevant to human health. Therefore, the next generation of biomedical scientists requires knowledge of and proficiency in data science. With support from the U.S. National Library of Medicine, a team of faculty from Indiana University-Purdue University Indianapolis (IUPUI) facilitated curricula enhancement for National Institutes of Health (NIH) T32 research training programs with respect to data science. In collaboration with the existing NIH T32 Program Directors at IUPUI and the IU School of Medicine, the interdisciplinary team of faculty drawn from multiple schools and departments examined the existing landscape of data science offerings on campus in parallel with an assessment of the competencies that future biomedical and clinician scientists will require to be comfortable using data science methods to advance their research. The IUPUI campus possesses a rich tapestry of data science education programs across multiple schools and departments. Furthermore, the campus is home to more than a dozen world-class T32 programs funded by the NIH to train biomedical and clinician scientists. However, existing training programs do not currently emphasize data science or provide a specific curriculum designed to ensure T32 graduates possess basic competencies in data science. To position the campus for the future, robust T32 programs need to connect with the rapidly growing data science programs. This report summarizes the rationale for the importance of connection and the competencies that future biomedical and clinical scientists will require to be successful. The report further describes the curriculum mapping efforts to link competencies with available degree programs, courses and workshops on campus. The report further recommends next steps for campus leadership, including but not limited to T32 Program Directors, the Office of the Vice Chancellor for Research, the Executive Associate Dean for Research Affairs at the IU School of Medicine, and the President and CEO of the Regenstrief Institute. Together we can strengthen the IUPUI campus and help ensure its T32 graduates are successful in their research careers.

    The RNA-Binding Protein Musashi1 Affects Medulloblastoma Growth via a Network of Cancer-Related Genes and Is an Indicator of Poor Prognosis

    Musashi1 (Msi1) is a highly conserved RNA-binding protein that is required during the development of the nervous system. Msi1 has been characterized as a stem cell marker, controlling the balance between self-renewal and differentiation, and has also been implicated in tumorigenesis, being highly expressed in multiple tumor types. We analyzed Msi1 expression in a large cohort of medulloblastoma samples and found that Msi1 is highly expressed in tumor tissue compared with normal cerebellum. Notably, high Msi1 expression levels proved to be a sign of poor prognosis. Msi1 expression was determined to be particularly high in molecular subgroups 3 and 4 of medulloblastoma. We determined that Msi1 is required for tumorigenesis because inhibition of Msi1 expression by small-interfering RNAs reduced the growth of Daoy medulloblastoma cells in xenografts. To characterize the participation of Msi1 in medulloblastoma, we conducted different high-throughput analyses. Ribonucleoprotein immunoprecipitation followed by microarray analysis (RIP-chip) was used to identify mRNA species preferentially associated with Msi1 protein in Daoy cells. We also used cluster analysis to identify genes with similar or opposite expression patterns to Msi1 in our medulloblastoma cohort. A network study identified RAC1, CTGF, SDCBP, SRC, PRL, and SHC1 as major nodes of an Msi1-associated network. Our results suggest that Msi1 functions as a regulator of multiple processes in medulloblastoma formation and could become an important therapeutic target.

    A Screen for RNA-Binding Proteins in Yeast Indicates Dual Functions for Many Enzymes

    Hundreds of RNA-binding proteins (RBPs) control diverse aspects of post-transcriptional gene regulation. To identify novel and unconventional RBPs, we probed high-density protein microarrays with fluorescently labeled RNA and selected 200 proteins that reproducibly interacted with different types of RNA from the budding yeast Saccharomyces cerevisiae. Surprisingly, more than half of these proteins represent previously known enzymes, many of them acting in metabolism, providing opportunities to directly connect intermediary metabolism with post-transcriptional gene regulation. We mapped the RNA targets for 13 proteins identified in this screen and found that they were associated with distinct groups of mRNAs, some of them coding for functionally related proteins. We also found that overexpression of the enzyme Map1 negatively affects the expression of experimentally defined mRNA targets. Our results suggest that many proteins may associate with mRNAs and possibly control their fates, providing dense connections between different layers of cellular regulation.

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.